Cross-Genre Age and Gender Identification in Social Media

نویسندگان

  • Anam Zahid
  • Aadarsh Sampath
  • Anindya Dey
  • Golnoosh Farnadi
چکیده

This paper gives a brief description on the methods adopted for the task of author-profiling as part of the competition PAN 2016 [1]. Author profiling is the task of predicting the author’s age and gender from his/her writing. In this paper, we follow a two-level ensemble approach to tackle the cross-genre author profiling task where training documents and testing documents are from different genres. We use the softvoting approach to build the classification ensemble. To include various feature sets, we first train logistic regression models using the extracted word n-gram, character n-gram, and part-of-speech n-gram features for each genre. We then ensemble single-genre predictive models trained on the blog, social media and Twitter data sources, to build our multi-genre ensemble approach. The experimental results indicate that our approach performs well in both single-genre and cross-genre author profiling tasks.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Age and Gender Identification using Stacking for Classification

This paper presents our approach of identifying the profile of an unknown user based on the activities of known users. The aim of author profiling task of PAN@CLEF 2016 is cross-genre identification of the gender and age of an unknown user. This means training the system using the behavior of different users from one social media platform and identifying the profile of other user on some differ...

متن کامل

Overview of the 4th Author Profiling Task at PAN 2016: Cross-Genre Evaluations

This overview presents the framework and the results of the Author Profiling task at PAN 2016. The objective was to predict age and gender from a cross-genre perspective. For this purpose a corpus from Twitter has been provided for training, and different corpora from social media, blogs, essays, and reviews have been provided for evaluation. Altogether, the approaches of 22 participants were e...

متن کامل

Author Profiling with Doc2vec Neural Network-Based Document Embeddings

To determine author demographics of texts in social media such as Twitter, blogs, and reviews, we use doc2vec document embeddings to train a logistic regression classifier. We experimented with age and gender identification on the PAN author profiling 2014–2016 corpora under both singleand cross-genre conditions. We show that under certain settings the neural network-based features outperform t...

متن کامل

GronUP: Groningen User Profiling

We train an SVM linear model on tweets to perform user profiling, in terms of gender and age, on non-Twitter social media data, whose actual nature is unknown to us at developing time. We choose features that we deem appropriate to profile authors on social media in general, and which do not characterise the specifics of Twitter data too closely. Additionally, we pay specific attention to engin...

متن کامل

Profiling Microblog Authors using Concreteness and Sentiment - Know-Center at PAN 2016 Author Profiling

The PAN 2016 author profiling task is a supervised classification problem on cross-genre documents (tweets, blog and social media posts). Our system makes use of concreteness, sentiment and syntactic information present in the documents. We train a random forest model to identify gender and age of a document’s author. We report the evaluation results received by the shared task.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016